ABSTRACT


A  slight  modification  of  Algorithm  229  is  described  to  compute  the  dual  regression  quantile  statistics 
which  are  essential  to  the  construction  of  the  regression  rank  score  statistics  introduced  by  Jureckova 
and  Gutenbrunner  [1].  The  latter  statistics  appear  to  generate  a  natural  new  approach  to  rank 
estimation  and  testing  for  the  linear  model. 


1.  Introduction 

In [5] we described an algorithm to compute the "regression quantile" statistics introduced in [4]. The algorithm employs parametric linear programming to solve the problem

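$$\min_{b \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_t(y_i - x_i'b) \tag{P}$$

where $\rho_t(u) = t\,|u|^{+} + (1-t)\,|u|^{-}$, with $|u|^{+} = \max(u,0)$ and $|u|^{-} = \max(-u,0)$. The corresponding dual problem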

$$\max_{a \in A} \sum_{i=1}^{n} y_i a_i \tag{D}$$

where $A = \{a \in [0,1]^n \mid X'a = (1-t)X'1\}$, has recently been studied by Jureckova and Gutenbrunner [1] and Jureckova, Gutenbrunner, Koenker and Portnoy [2] and appears to be of substantial independent statistical interest. Indeed the dual solution $\hat a_n(t)$ serves as a foundation for a new theory of linear rank statistics for the linear regression model. In this note we will briefly describe some (slight) modifications of our original algorithm required to compute $\hat a_n(t)$ and sketch briefly how it might be used.
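As an illustration only, the following Python sketch evaluates the check function $\rho_t$ and the objective of (P) in the simplest case, the location model in which every $x_i = 1$. It is not part of the algorithm of [5]; the function names and the data are hypothetical, and the brute-force search over the observations merely stands in for the parametric linear program.

```python
def rho(u, t):
    """Check function rho_t(u) = t*|u|^+ + (1-t)*|u|^-."""
    return t * max(u, 0.0) + (1.0 - t) * max(-u, 0.0)


def objective(y, b, t):
    """Objective of (P) in the location model, where every x_i = 1."""
    return sum(rho(yi - b, t) for yi in y)


def location_quantile(y, t):
    """Search the observations for a minimiser of (P).

    In the location model a minimiser can always be found among the
    data points, so a search over them suffices for illustration.
    """
    return min(sorted(y), key=lambda b: objective(y, b, t))


y = [3.1, 0.4, 2.2, 5.0, 1.7]          # made-up data
median = location_quantile(y, 0.5)      # the sample median of y
lower = location_quantile(y, 0.3)       # second smallest observation
```

With five observations the minimiser at $t = 0.5$ is the sample median, and at $t = 0.3$ it is the second order statistic, in line with the interpretation of the solution of (P) as a sample quantile.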

Recall that a solution $\hat\beta_n(t)$ to the (primal) problem (P) is computed first at the arbitrary point $t = 1/n$, and subsequently the algorithm finds a sequence of breakpoints $t_j$, $j = 1,2,\dots,J_n$, at which the solution vector $\hat\beta_n(t)$ is modified by one simplex pivot operation. The sample path $\hat\beta_n(t)$ is piecewise constant with jumps at these breakpoints. (Portnoy [7] has recently shown that $EJ_n = O(n \log n)$ as $n \to \infty$, under mild regularity conditions on the process generating $X$ and $y$.) In contrast, the components of $\hat a_n(t)$ are piecewise linear and continuous with kinks at the points where $\hat\beta_n(t)$ jumps. To describe the computation in more detail, we must recall some further notation from [4].

Let $h$ denote the index set of "basic observations" at $t$ such that

$$r_i(t) = y_i - x_i'\hat\beta_n(t) = 0, \qquad i \in h.$$

Barring degeneracies, there are exactly $p$ elements of $h$ at each $t$ at which the solution is evaluated. By complementary slackness,

$$\hat a_i(t) = \begin{cases} 1 & \text{if } r_i(t) > 0 \\ 0 & \text{if } r_i(t) < 0 \end{cases}$$

so partitioning $a = (a_h, a_{\bar h})'$ we have from the dual equality constraint,

$$X_h' a_h = (1-t)X'1 - X_{\bar h}' a_{\bar h}$$

where $X$ has been reordered and partitioned to conform with $a$. Fortuitously, however, $\hat a_h$ is exactly what the tableau in the primal problem computes to test the optimality of $\hat\beta_n(t)$. Recall the famous maxim of linear programming: "dual feasibility implies primal optimality." Thus to obtain $\hat a_n(t)$ we need only to create an appropriate array and copy the appropriate values to it from the tableau of the primal problem. Details are provided in the following section.
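The complementary slackness argument can be made concrete in the location model, where $X$ is a column of ones and $p = 1$, so that the dual equality constraint reduces to a single scalar equation for the one basic observation. The Python sketch below is a hypothetical illustration under those assumptions (distinct observations, $0 < t < 1$, and $tn$ not an integer); it does not read anything from the simplex tableau, as the actual algorithm does.

```python
def dual_solution_location(y, t):
    """Dual solution a(t) for the location model (X a column of ones).

    Assumes distinct y values, 0 < t < 1, and t*n not an integer, so that
    exactly one observation is basic (has a zero residual).
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    # The primal solution is the order statistic with the smallest rank R > t*n.
    rank = next(r for r in range(1, n + 1) if r > t * n)
    h = order[rank - 1]                 # index of the basic observation
    b = y[h]                            # primal solution at t
    # Complementary slackness: 1 for positive residuals, 0 for negative ones.
    a = [1.0 if yi > b else 0.0 for yi in y]
    # Dual equality constraint with X_h = 1:
    #   a_h = (1 - t) * n - (sum of the nonbasic a_i)
    a[h] = (1.0 - t) * n - sum(a[i] for i in range(n) if i != h)
    return a


y = [3.1, 0.4, 2.2, 5.0, 1.7]           # made-up data
a = dual_solution_location(y, 0.3)      # basic observation is y = 1.7
```

Here the basic observation has rank 2, three observations lie above it, and the constraint gives $a_h = 0.7 \cdot 5 - 3 = 0.5$, which lies in $[0,1]$ as dual feasibility requires.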

2.  Implementation 

To modify the algorithm we have added the array DSOL to the calling sequence, declaring it REAL DSOL(M,NSOL), and in place of the statement

  330 CONTINUE

we have inserted the statements:


      KD=ABS(WA(M4,I))-N
      DSOL(KD,LSOL)=ONE+WA(M1,I)/TWO
      IF(WA(M4,I).LT.ZERO) DSOL(KD,LSOL)=ONE-DSOL(KD,LSOL)
  330 CONTINUE
      DO 335 I=KL,M
      KD=ABS(WA(I,N2))-N
      IF(WA(I,N2).LT.ZERO) DSOL(KD,LSOL)=ZERO
      IF(WA(I,N2).GE.ZERO) DSOL(KD,LSOL)=ONE
  335 CONTINUE

and  we  have  replaced  the  group  of  statements 


      DO 350 I=2,LSOL
      SOL(1,I-1)=SOL(1,I)
  350 CONTINUE
      LSOL=LSOL-1


with  the  statements 


      DO 350 I=1,M
      DSOL(I,1)=ONE
      DSOL(I,LSOL)=ZERO
  350 CONTINUE
      SOL(1,1)=ZERO


One consequence of the latter substitution is to add a redundant column to the array SOL, which contains the primal solution: the last $p+1$ elements of the last 2 columns are now identical and contain the vector

for $t \in [t_{J_n}, 1]$, where $t_{J_n}$ is SOL(1,LSOL-1). The columns of the new array DSOL correspond to the evaluation of $\hat a_n(t)$ at $t =$ SOL(1,I), I=1,LSOL. Since SOL(1,1)=0 and SOL(1,LSOL)=1, the first and last columns of DSOL are ones and zeros respectively.
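Because the components of the dual solution are piecewise linear with kinks only at the breakpoints, a row of DSOL together with the first row of SOL determines the whole sample path. The Python sketch below shows one way the path might be evaluated at an arbitrary $t$ by linear interpolation. The function name and the small arrays are hypothetical stand-ins for SOL(1,1..LSOL) and one row of DSOL, and the breakpoints are assumed to be strictly increasing.

```python
from bisect import bisect_right


def dual_at(t, breakpoints, path):
    """Linearly interpolate one observation's dual path at t.

    breakpoints: strictly increasing values from 0 to 1, playing the
                 role of SOL(1,1), ..., SOL(1,LSOL)
    path:        the matching row of DSOL for that observation
    """
    if not breakpoints[0] <= t <= breakpoints[-1]:
        raise ValueError("t lies outside the range of the breakpoints")
    # Index of the right-hand end of the segment containing t.
    j = min(bisect_right(breakpoints, t), len(breakpoints) - 1)
    t0, t1 = breakpoints[j - 1], breakpoints[j]
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * path[j - 1] + w * path[j]


# Made-up path: equal to 1 up to t = 0.2, falling linearly to 0 at t = 0.4.
breakpoints = [0.0, 0.2, 0.4, 1.0]
path = [1.0, 1.0, 0.0, 0.0]
midway = dual_at(0.3, breakpoints, path)   # halfway down the linear stretch
```

The first and last entries of the path are one and zero, matching the first and last columns of DSOL described above.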

3.   An  Example 

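The use of the resulting array is illustrated by the computation of the Wilcoxon scores, which in the present notation would be

$$s_i = \int_0^1 \hat a_i(t)\,dt - \tfrac{1}{2} = \tfrac{1}{2}\Big(\sum_{j} \big(\hat a_i(t_{j+1}) + \hat a_i(t_j)\big)(t_{j+1} - t_j) - 1\Big), \tag{3.1}$$

where the sum runs over successive breakpoints.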

Since $\hat a_n(t)$ is piecewise linear the trapezoidal rule is exact. In the location model where the design matrix, $X$, is simply an $n$-vector of ones, $\hat a_i(t)$ takes the form

$$\hat a_i(t) = a^*(R_i, t) = \begin{cases} 1 & t \le (R_i - 1)/n \\ R_i - tn & (R_i - 1)/n < t \le R_i/n \\ 0 & R_i/n < t \end{cases}$$

where $R_i$ is the rank of the $i$th observation and the function $a^*(j,t)$ is exactly as introduced by Hajek and Sidak [3, section 3.5]. The connection of $\hat a_n(t)$ to the ranks is immediate from the relation

$$n \int_0^1 \hat a_i(t)\,dt + \tfrac{1}{2} = R_i, \qquad i = 1,\dots,n.$$
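The relation between the integrated dual path and the ranks can be checked numerically. In the Python sketch below, which is an illustration with hypothetical function names rather than part of the algorithm, the location-model path is tabulated at the breakpoints $t_j = j/n$, integrated by the trapezoidal rule, and converted back to a rank. The score is formed as the integral minus one half, which is the quantity the trapezoidal sum in (3.1) evaluates.

```python
def rank_score_path(rank, n):
    """Breakpoints t_j = j/n and the location-model dual path at them.

    At t = j/n the path equals 1 for j <= rank - 1 and 0 for j >= rank;
    between breakpoints it is linear.
    """
    ts = [j / n for j in range(n + 1)]
    vals = [1.0 if j <= rank - 1 else 0.0 for j in range(n + 1)]
    return ts, vals


def trapezoid(ts, vals):
    """Trapezoidal rule, exact for a piecewise linear path with kinks at ts."""
    return 0.5 * sum((vals[j + 1] + vals[j]) * (ts[j + 1] - ts[j])
                     for j in range(len(ts) - 1))


def wilcoxon_score(ts, vals):
    """Integral of the path minus one half."""
    return trapezoid(ts, vals) - 0.5


n = 5
# Recover each rank from its integrated path: n * integral + 1/2.
recovered = [n * trapezoid(*rank_score_path(r, n)) + 0.5
             for r in range(1, n + 1)]
scores = [wilcoxon_score(*rank_score_path(r, n)) for r in range(1, n + 1)]
```

Up to rounding error, `recovered` reproduces the ranks 1 through 5, and the scores sum to zero, as centred Wilcoxon scores should.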

Thus, just as solving the problem dual to (D) yields the sample quantiles or order statistics in the location model, solving (D) itself finds the ranks, with the aid of (3.1). The regression rank scores in the linear model are of course not so simple, but they preserve several important characteristics from the location case. Sample paths $\{\hat a_i(t): 0 \le t \le 1\}$ are (i) continuous, (ii) piecewise linear, (iii) $\hat a_i(0) = 1$ and $\hat a_i(1) = 0$, and (iv) $0 \le \hat a_i(t) \le 1$. However they are not generally monotone decreasing as in the location case. To illustrate we have computed $\hat a_n(t)$ for the well-known stackloss data and the sample paths for each of the 21 observations appear as Figure 1. While several of these figures have the characteristic shape we might have expected from the location model, others, notably {10,12,16,19}, are quite different. Observations like these, which have a prolonged transit from one to zero, tend to be influential design points. Indeed Portnoy [6] has suggested using the length of the interval for which $\hat a_i(t) \in (0,1)$ as a diagnostic for "outlyingness".
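One possible reading of this diagnostic is sketched below in Python. The function name, the tolerance `eps`, and the example paths are hypothetical. The sketch relies on the path being linear between breakpoints and confined to $[0,1]$, so that a segment lies strictly inside $(0,1)$ on its interior unless both of its endpoints sit at zero or both sit at one.

```python
def transit_length(ts, vals, eps=1e-9):
    """Total length of t for which a piecewise linear path lies in (0, 1).

    ts:   increasing breakpoints from 0 to 1
    vals: values of the path at those breakpoints, each in [0, 1]
    eps:  tolerance for treating a value as exactly 0 or exactly 1
    """
    total = 0.0
    for j in range(len(ts) - 1):
        lo, hi = vals[j], vals[j + 1]
        both_zero = lo <= eps and hi <= eps
        both_one = lo >= 1.0 - eps and hi >= 1.0 - eps
        if both_zero or both_one:
            continue                      # path sits on a bound over this segment
        total += ts[j + 1] - ts[j]
    return total


# A location-model-like path with a short transit ...
short = transit_length([0.0, 0.2, 0.4, 1.0], [1.0, 1.0, 0.0, 0.0])
# ... and a made-up, non-monotone path with a prolonged transit.
long_ = transit_length([0.0, 0.1, 0.5, 0.9, 1.0], [1.0, 0.6, 0.7, 0.0, 0.0])
```

In the location model every observation has transit length $1/n$, so values much larger than this would flag the kind of path described above.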

Computing the Wilcoxon regression rank scores yields the scores reported above each panel, and it is clear from these scores that the observations {4,9,21} are unusual, as is generally recognized in other robust analyses of these data.


REFERENCES 

[1] Gutenbrunner, C. and Jureckova, J. (1990). Regression quantile and regression rank score process in the linear model and derived statistics, to appear, Annals of Statistics.

[2] Gutenbrunner, C., Jureckova, J., Koenker, R. and Portnoy, S. (1990). A new approach to rank tests for the linear model, in preparation.

[3] Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests. Academia, Prague.

[4] Koenker, R.W. and Bassett, G.W. (1978). Regression quantiles. Econometrica, 46, 33-50.

[5] Koenker, R.W. and d'Orey, V. (1987). Computing regression quantiles. Applied Statistics, 36, 383-393.

[6] Portnoy, S. (1987). Using regression quantiles to identify outliers. Statistical Data Analysis Based on the L1 Norm and Related Methods (ed: Y. Dodge), North Holland, Amsterdam, 345-356.

[7] Portnoy, S. (1989). Asymptotic behavior of the number of regression quantile breakpoints, forthcoming in J. Sci. Statist. Computing.


Figure  1 
Regression  Rankscores  for  the  Stackloss  Data 